Abstract

The NAS Parallel Benchmarks (NPB) were developed in 1991 at NASA Ames Research 
Center to study the performance of parallel supercomputers. The eight benchmark problems are 
specified in a “pencil and paper” fashion, i.e., the complete details of the problem are given in a 
NAS technical document. Except for a few restrictions, benchmark implementors are free to select 
the language constructs and implementation techniques best suited for a particular system. In this 
paper, we present new NPB performance results for the following systems: 

(a) Parallel-Vector Processors: CRAY C90, CRAY T90, and Fujitsu VPP500;

(b) Highly Parallel Processors: CRAY T3D, IBM SP2-WN (Wide Nodes), and IBM SP2-TN2 
(Thin Nodes 2); 

(c) Symmetric Multiprocessors: Convex Exemplar SPP1000, CRAY J90, DEC Alpha Server 
8400 5/300, and SGI Power Challenge XL (75 MHz). 

We also present sustained performance per dollar for the Class B LU, SP, and BT benchmarks, and
we outline future NAS plans for the NPB.
1: Introduction

The Numerical Aerodynamic Simulation (NAS) Program, located at NASA Ames Research 
Center, is a pathfinder in high-performance computing for NASA and is dedicated to advancing 
the science of computational aerodynamics. One key goal of the NAS organization is to 
demonstrate by the year 2000 an operational computing system capable of simulating an entire 
aerospace vehicle system in one to several hours. It is currently projected that the solution of this 
Grand Challenge problem will require a system that can perform scientific computations at a 
sustained rate approximately 1000 times that of 1990-generation supercomputers. Such a
computer system will most likely employ hundreds or even thousands of powerful RISC 
processors operating in parallel. 

In order to objectively measure the performance of various highly parallel computer systems 
and to compare them with conventional supercomputers, NAS has developed the NAS Parallel 
Benchmarks (NPB) [1,2]. Note that the NPB are distinct from the NAS High Speed Processor 
(HSP) benchmarks and procurements. The HSP benchmarks are used for evaluating production 
supercomputers for procurements in the NAS organization, whereas the NPB are used for 
studying highly parallel processor (HPP) systems in general. 

2: NAS Parallel Benchmarks 

The NPB consist of a set of eight benchmark problems, each of which focuses on some 
important aspect of highly parallel supercomputing for aerophysics applications. Some extension 
of Fortran or C is required for implementations, and reasonable limits are placed on the use of 
assembly code and the like. Otherwise, programmers are free to utilize language constructs that 
maximize performance on the particular system being studied. The choice of data structures, 
processor allocation, and memory usage is generally left to the discretion of the
implementor.

The eight problems consist of five kernels and three simulated computational fluid dynamics 
(CFD) applications. The five kernels comprise relatively compact problems, each emphasizing a 
particular type of numerical computation. Compared with the simulated CFD applications, they 
can be implemented fairly readily and provide insight as to the general levels of performance that 
can be expected on these specific types of numerical computations. 

The simulated CFD applications, on the other hand, usually require more effort to implement, 
but they are more representative of the types of actual data movement and computation required 
in state-of-the-art CFD application codes. For example, in an isolated kernel, a certain data 
structure may be very efficient on a certain system; and yet, this data structure may be 
inappropriate if incorporated into a larger application. By comparison, the simulated CFD 
applications require data structures and implementation techniques that are more typical of real 
CFD applications. 

(Space does not permit a complete description of these benchmark problems. A more detailed 
description of these benchmarks, together with the rules and restrictions associated with them, is 
given in Reference 2.) 

Sample Fortran programs implementing the NPB on a single-processor system are available to
aid implementors. These programs, as well as the benchmark document itself, are available by
mail from: NAS Systems Division, Mail Stop 258-6, NASA Ames Research Center, Moffett
Field, CA 94035, Attn: NAS Parallel Benchmark Codes. Alternatively, send e-mail to
bm-codes@nas.nasa.gov, or access the World Wide Web at URL:
http://www.nas.nasa.gov/NAS/NPB/software/npb-software.html

There are now two standard sizes for the NAS Parallel Benchmarks: Class A and Class B
problems. The nominal problem sizes for both classes are shown in Table 1, which also gives the
standard floating point operation (flop) counts. We recommend that those wishing to compute
performance rates in millions of floating point operations per second (Mflop/s) use these
standard flop counts. Table 1 also lists Mflop/s rates calculated in this manner for the current
fastest implementation on one processor of the CRAY Y-MP for Class A and on one processor of
the CRAY C90 for Class B. Note, however, that Tables 2 through 9 do not cite performance rates
in Mflop/s; instead, we present the wall clock times and the equivalent performance ratios. We
suggest that these, not Mflop/s, be examined when comparing different systems and
different systems and implementations.
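As a worked example of the recommended calculation, the short sketch below divides a standard operation count from Table 1 by a measured wall clock time. The helper name `mflops` is ours, not part of any NPB distribution. The inputs are the Class B CG and IS counts from Table 1 together with the single-processor CRAY C90 Class B times listed in Tables 4 and 6; with these inputs the sketch reproduces the 447 and 244 Mflop/s entries of Table 1.

```python
def mflops(op_count_billions: float, seconds: float) -> float:
    """Rate in Mflop/s from a standard operation count (units of 10^9) and a wall clock time."""
    if seconds <= 0:
        raise ValueError("run time must be positive")
    return op_count_billions * 1e9 / seconds / 1e6

# Class B, one CRAY C90 processor: counts from Table 1, times from Tables 4 and 6.
cg_rate = mflops(54.89, 122.90)
is_rate = mflops(3.150, 12.92)   # IS uses an integer-operation count in place of flops
print(f"CG: {cg_rate:.0f} Mflop/s, IS: {is_rate:.0f} Mflop/s")
```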

With the exception of the IS benchmark, these standard flop counts were determined by using 
the hardware performance monitor on the CRAY Y-MP or CRAY C90, and we believe that they 
are close to the minimal counts required for these problems. In the case of the IS benchmark, 
which does not involve floating-point operations, we selected a value approximately equal to the 
number of integer operations required, in order to permit the computation of performance rates 
analogous to Mflop/s rates. We reserve the right to change these standard flop counts in the future, 
if necessary. 

The NAS organization reserves the right to verify any NPB results that are submitted to us. We 
may, for example, attempt to run the submitter’s code on another system of the same configuration 
as that used by the submitter. In those instances where we are unable to reproduce the submitter’s
results (allowing a 5% tolerance), our policy is to alert the submitter to the discrepancy and
allow the submitter to resolve it in the next release of this report. If the discrepancy
is not resolved to our satisfaction, then our own observed results and not the submitter’s results 
will be reported. This policy will apply to all results NAS receives and publishes. 
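The verification rule can be expressed as a small check. The sketch below assumes the 5% tolerance is a relative difference measured against the submitter's reported time; the text does not say which time is the baseline, so treat that choice as our assumption. Both function names are hypothetical.

```python
def within_tolerance(submitted_s: float, observed_s: float, tol: float = 0.05) -> bool:
    """True if the time NAS observes is within `tol` (relative) of the submitted time.

    Assumption: the tolerance is measured relative to the submitter's reported time.
    """
    if submitted_s <= 0:
        raise ValueError("submitted time must be positive")
    return abs(observed_s - submitted_s) / submitted_s <= tol

def reported_time(submitted_s: float, observed_s: float) -> float:
    """Time published if a discrepancy is NOT resolved: the submitter's time only if reproducible."""
    return submitted_s if within_tolerance(submitted_s, observed_s) else observed_s

print(within_tolerance(100.0, 104.0), within_tolerance(100.0, 106.0))
```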

3: Benchmark Changes 

Because the benchmarks are specified in only a “pencil and paper” fashion, it is inevitable that 
loopholes develop whereby the benchmark rules are not violated but the benchmark intent is 
defeated. Some changes have been made in the Embarrassingly Parallel (EP) and Conjugate Gradient (CG)
benchmark specifications in order to close some loopholes that have developed with these kernels 
[3]. 

4: NAS Parallel Benchmark Results 

In this section, each of the eight benchmarks is briefly described, and the best performance
results we have received to date for each computer system are given in Tables 2 through 9.
These tables include run times and performance ratios. The performance ratios compare individual
timings with the current best time for that benchmark achieved on one processor of the CRAY
Y-MP for Class A and on one processor of the CRAY C90 for Class B. The run times in each case
are elapsed times measured in accordance with the NPB rules. This paper reports benchmark
results on the following systems: Convex Exemplar SPP1000 by CONVEX Computer Corporation;
CRAY C90, CRAY J90, CRAY T3D, CRAY T90, and CRAY Y-MP by Cray Research Inc. (CRI);
DEC Alpha Server 8400 5/300 by Digital Equipment Corporation; IBM SP2-WN and IBM SP2-TN2
by International Business Machines (IBM); Fujitsu VPP500 by Fujitsu America Inc.; and Power
Challenge XL (75 MHz) by Silicon Graphics Inc.
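Since each performance ratio is the reference time divided by the measured time, values above 1 mean faster than one processor of the reference machine. A minimal sketch (function and constant names are ours), checked against two Class A CG entries of Table 4, whose CRAY Y-MP/1 reference time is 11.92 s:

```python
def performance_ratio(reference_s: float, system_s: float) -> float:
    """Reference single-processor time divided by the system's time (higher is faster)."""
    return reference_s / system_s

YMP1_CG_CLASS_A = 11.92  # seconds, CRAY Y-MP/1 entry in Table 4

c90_1 = performance_ratio(YMP1_CG_CLASS_A, 3.43)     # one CRAY C90 processor
t3d_1024 = performance_ratio(YMP1_CG_CLASS_A, 0.58)  # 1024 CRAY T3D processors
print(round(c90_1, 2), round(t3d_1024, 1))
```

The printed values should match the 3.48 and 20.6 ratios listed in Table 4.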
This paper includes a number of new results, including previously unpublished Convex
Exemplar SPP1000, CRAY C90, CRAY J90, CRAY T3D, CRAY T90, DEC Alpha Server
8400 5/300, IBM SP2-WN, and IBM SP2-TN2 results. The benchmark results are presented under
two classes: Kernels and CFD Applications.

Table 1: Standard operation counts for the NPB.

                                    Class A                          Class B
Benchmark Name                Abb.  Nominal       Op count  Mflop/s  Nominal       Op count  Mflop/s
                                    size          (x10^9)   Y-MP/1   size          (x10^9)   C90/1
Embarrassingly Parallel       EP    2^28          26.68     211      2^30          100.9     543
Multigrid                     MG    256^3         3.905     176      256^3         18.81     498
Conjugate Gradient            CG    14 x 10^3     1.508     127      75 x 10^3     54.89     447
3-D FFT PDE                   FT    256^2 x 128   5.631     196      512 x 256^2   71.37     560
Integer Sort                  IS    2^23 x 2^19   0.7812    68       2^25 x 2^21   3.150     244
LU Simulated CFD Application  LU    64^3          64.57     194      102^3         319.6     493
SP Simulated CFD Application  SP    64^3          102.0     216      102^3         447.1     627
BT Simulated CFD Application  BT    64^3          181.3     229      102^3         721.5     572

4.1: Kernels 

The results for the five kernels (EP, MG, CG, FT, and IS) are given in the following subsections.
4.1.1: The Embarrassingly Parallel (EP) Benchmark 

The first of the five kernel benchmarks is an embarrassingly parallel problem. In this 
benchmark, two-dimensional statistics are accumulated from a large number of Gaussian pseudo- 
random numbers, which are generated according to a particular scheme that is well-suited for 
parallel computation. This problem is typical of many Monte Carlo applications. Since it requires
almost no communication, in some sense this benchmark provides an estimate of the upper limit
of achievable floating point performance on a particular system. Results for the EP benchmark
are given in Table 2.
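To make the "well-suited for parallel computation" property concrete, here is a scaled-down sketch of an EP-style computation. Following our reading of the NPB specification [2], it assumes a multiplicative linear congruential generator x(k+1) = a x(k) mod 2^46 with a = 5^13 and seed 271828183, and the Marsaglia polar method to turn uniform pairs into Gaussian pairs whose counts are tallied in square annuli; treat these constants and the tallying as assumptions, not as the official benchmark code. The key point is `skip_ahead`: because x(k) = a^k x(0) mod 2^46, each processor can jump directly to its own segment of the random stream, so the work splits with no communication until the final sums.

```python
import math

A = 5 ** 13          # LCG multiplier (assumed, per our reading of the NPB spec)
MOD = 2 ** 46        # LCG modulus
SEED = 271828183     # starting seed (assumed)

def skip_ahead(seed: int, k: int) -> int:
    """LCG state after k steps, computed directly as a^k * seed mod 2^46."""
    return pow(A, k, MOD) * seed % MOD

def ep_segment(first_pair: int, num_pairs: int, num_annuli: int = 10):
    """Process pairs [first_pair, first_pair + num_pairs) of the global random stream."""
    x = skip_ahead(SEED, 2 * first_pair)   # jump to this segment's start
    sx = sy = 0.0
    counts = [0] * num_annuli
    for _ in range(num_pairs):
        x = A * x % MOD
        u1 = 2.0 * x / MOD - 1.0           # uniform in (-1, 1)
        x = A * x % MOD
        u2 = 2.0 * x / MOD - 1.0
        t = u1 * u1 + u2 * u2
        if 0.0 < t <= 1.0:                 # accept points inside the unit disk
            f = math.sqrt(-2.0 * math.log(t) / t)
            gx, gy = u1 * f, u2 * f        # pair of independent Gaussian deviates
            sx += gx
            sy += gy
            ring = int(max(abs(gx), abs(gy)))
            if ring < num_annuli:
                counts[ring] += 1
    return sx, sy, counts

# Serial run versus the same stream split between two "processors".
_, _, serial = ep_segment(0, 2000)
_, _, part0 = ep_segment(0, 1000)
_, _, part1 = ep_segment(1000, 1000)
combined = [a + b for a, b in zip(part0, part1)]
print(serial == combined, sum(serial))
```

The annulus counts agree exactly because every pair comes from the same position in the stream either way; floating-point sums such as `sx` may differ in the last bits because the addition order changes.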

4.1.2: Multigrid (MG) Benchmark 

The second kernel benchmark is a simplified multigrid kernel, which solves a 3-D Poisson 
PDE. This problem is simplified in the sense that it has constant rather than variable coefficients,
unlike a more realistic application. This code is a good test of both short- and long-distance
highly structured communication. The Class B problem uses the same grid size as Class A but a
greater number of inner loop iterations. Results for this benchmark are shown in Table 3.
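The MG kernel applies V-cycles to a 3-D Poisson problem. As a much smaller illustration of the same idea, the sketch below runs V-cycles on the 1-D problem -u'' = f on (0, 1) with zero boundary values. The smoother (weighted Jacobi with omega = 2/3), full-weighting restriction, and linear interpolation are common textbook choices and are our assumptions; they are not the operators prescribed by the NPB specification.

```python
def apply_a(u, h):
    """(2u_i - u_{i-1} - u_{i+1}) / h^2 with zero Dirichlet boundary values."""
    n = len(u)
    return [(2.0 * u[i] - (u[i - 1] if i > 0 else 0.0)
             - (u[i + 1] if i < n - 1 else 0.0)) / (h * h) for i in range(n)]

def smooth(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps; each sweep uses only the previous iterate."""
    n = len(u)
    for _ in range(sweeps):
        u = [(1.0 - omega) * u[i]
             + omega * 0.5 * ((u[i - 1] if i > 0 else 0.0)
                              + (u[i + 1] if i < n - 1 else 0.0) + h * h * f[i])
             for i in range(n)]
    return u

def restrict(r):
    """Full weighting: fine grid of 2m+1 points -> coarse grid of m points."""
    m = (len(r) - 1) // 2
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2] for j in range(m)]

def prolong(e, n):
    """Linear interpolation: coarse grid of m points -> fine grid of n = 2m+1 points."""
    u = [0.0] * n
    for j, v in enumerate(e):
        u[2 * j + 1] += v
        u[2 * j] += 0.5 * v
        u[2 * j + 2] += 0.5 * v
    return u

def v_cycle(u, f, h):
    if len(u) == 1:                        # coarsest grid: solve 2u/h^2 = f exactly
        return [f[0] * h * h / 2.0]
    u = smooth(u, f, h, 2)                 # pre-smoothing
    r = [fi - ai for fi, ai in zip(f, apply_a(u, h))]
    e = v_cycle([0.0] * ((len(u) - 1) // 2), restrict(r), 2.0 * h)
    u = [ui + ei for ui, ei in zip(u, prolong(e, len(u)))]
    return smooth(u, f, h, 2)              # post-smoothing

def residual_norm(u, f, h):
    return max(abs(fi - ai) for fi, ai in zip(f, apply_a(u, h)))

n = 63                                     # 2^6 - 1 interior points
h = 1.0 / (n + 1)
f = [1.0] * n                              # exact solution is u(x) = x(1 - x)/2
u = [0.0] * n
r0 = residual_norm(u, f, h)
for _ in range(10):
    u = v_cycle(u, f, h)
print(residual_norm(u, f, h) / r0)
```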

4.1.3: Conjugate Gradient (CG) Benchmark 

In this benchmark, a conjugate gradient method is used to compute an approximation to the 
smallest eigenvalue of a large, sparse, symmetric positive definite matrix. This kernel is typical of 
unstructured grid computations in that it tests irregular long-distance communication and employs 
sparse matrix-vector multiplication. Results are shown in Table 4.
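As we understand the specification [2], the CG kernel's outer loop is an inverse power iteration: each step solves A z = x approximately with a fixed number of CG iterations and updates an eigenvalue estimate from the product x.z. The sketch below follows that structure on a tiny sparse matrix stored row-wise as (column, value) pairs. The storage format, the omission of the eigenvalue shift used by the benchmark, and the test matrix are our assumptions. The test matrix is the 1-D Laplacian, whose smallest eigenvalue 2 - 2cos(pi/(n+1)) is known, so the result can be checked.

```python
import math

def matvec(rows, x):
    """Sparse matrix-vector product; rows[i] is a list of (column, value) pairs."""
    return [sum(v * x[j] for j, v in row) for row in rows]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def cg_solve(rows, b, iterations=25, tol=1e-12):
    """Conjugate gradient for a symmetric positive definite system A z = b, from z = 0."""
    z = [0.0] * len(b)
    r = list(b)
    p = list(b)
    rho = dot(r, r)
    for _ in range(iterations):
        q = matvec(rows, p)
        alpha = rho / dot(p, q)
        z = [zi + alpha * pi for zi, pi in zip(z, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rho_new = dot(r, r)
        if rho_new <= tol * tol * dot(b, b):
            break
        p = [ri + (rho_new / rho) * pi for ri, pi in zip(r, p)]
        rho = rho_new
    return z

def smallest_eigenvalue(rows, outer=30):
    """Inverse power iteration: z solves A z = x, estimate = 1 / (x . z), then x = z / |z|."""
    n = len(rows)
    x = [1.0 / math.sqrt(n)] * n
    zeta = 0.0
    for _ in range(outer):
        z = cg_solve(rows, x)
        zeta = 1.0 / dot(x, z)
        norm = math.sqrt(dot(z, z))
        x = [zi / norm for zi in z]
    return zeta

n = 10
rows = [[(j, v) for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)) if 0 <= j < n]
        for i in range(n)]
estimate = smallest_eigenvalue(rows)
exact = 2.0 - 2.0 * math.cos(math.pi / (n + 1))
print(estimate, exact)
```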
4.1.4: 3-D FFT PDE (FT) Benchmark

In this benchmark a 3-D partial differential equation is solved using FFTs. This kernel performs 
the essence of many spectral methods. It is a good test of long-distance communication 
performance. The rules of the NPB specify that assembly-coded library routines may be used to
perform matrix multiplication and one-dimensional, two-dimensional, or three-dimensional FFTs.
Thus this benchmark is unusual in that computational library routines may be legally
employed. Results are shown in Table 5. 
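The FT kernel advances a PDE in Fourier space: transform once, multiply each coefficient by a decay factor, and transform back. Assuming the PDE is the diffusion equation u_t = alpha u_xx on a periodic unit domain (our reading of the specification, which poses it in 3-D), each Fourier mode k simply decays by exp(-4 pi^2 alpha k^2 t). The 1-D sketch below uses a textbook recursive radix-2 FFT written from scratch for illustration; a production code would instead call a tuned (possibly assembly-coded) library routine, as the NPB rules permit.

```python
import cmath
import math

def fft(a, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def evolve(u0, alpha, t):
    """Solve u_t = alpha * u_xx on [0, 1) with periodic boundaries, spectrally."""
    n = len(u0)
    coeffs = fft([complex(v) for v in u0])
    for k in range(n):
        kk = k if k < n // 2 else k - n          # signed wavenumber
        coeffs[k] *= math.exp(-4.0 * math.pi ** 2 * alpha * kk * kk * t)
    return [v.real / n for v in fft(coeffs, inverse=True)]

n, alpha, t = 32, 0.01, 1.0
u0 = [math.cos(2 * math.pi * 3 * j / n) for j in range(n)]   # a single mode, k = 3
u = evolve(u0, alpha, t)
decay = math.exp(-4.0 * math.pi ** 2 * alpha * 9 * t)
err = max(abs(a - decay * b) for a, b in zip(u, u0))
print(err)
```

For a single-mode initial condition the printed discrepancy should be at the level of rounding error, since that mode is scaled by exactly the analytic decay factor.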

4.1.5: Integer Sort (IS) Benchmark 

This benchmark tests a sorting operation that is important in particle method codes. This type 
of application is similar to particle-in-cell applications of physics, wherein particles are assigned 
to cells and may drift out. The sorting operation is used to reassign particles to the appropriate 
cells. This benchmark tests both integer computation speed and communication performance. 
This problem is unique in that floating point arithmetic is not involved. Significant data 
communication, however, is required. Results are shown in Table 6. 
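The IS task is to rank integer keys drawn from a bounded range (reading the nominal sizes in Table 1 as number of keys times key range, Class A has 2^23 keys with values below 2^19). A counting-sort style ranking is one natural way to do this serially; the sketch below is our illustration, not the benchmark's reference code, and `rank_keys` is a hypothetical name. It uses no floating point at all, matching the description above; in a distributed version, building the global histogram and moving keys to their owners is where the communication would arise.

```python
def rank_keys(keys, max_key):
    """Return rank[i] = position of keys[i] in a stable ascending sort (counting sort)."""
    counts = [0] * max_key
    for k in keys:
        counts[k] += 1
    # Exclusive prefix sum: first output slot for each key value.
    start = [0] * max_key
    total = 0
    for value in range(max_key):
        start[value] = total
        total += counts[value]
    ranks = []
    for k in keys:
        ranks.append(start[k])
        start[k] += 1
    return ranks

keys = [5, 3, 7, 3, 0, 5, 1]
ranks = rank_keys(keys, max_key=8)
ordered = [None] * len(keys)
for key, r in zip(keys, ranks):
    ordered[r] = key            # placing each key at its rank yields the sorted order
print(ranks, ordered)
```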

4.2: Simulated CFD Application Benchmarks 

The three simulated CFD application benchmarks are intended to accurately represent the 
principal computational and data movement requirements of modern CFD applications.

4.2.1: LU Simulated CFD Application (LU) Benchmark 

The first of these is the so-called lower-upper diagonal (LU) benchmark. It does not perform
an LU factorization but instead employs a symmetric successive over-relaxation (SSOR) numerical
scheme to solve a regular-sparse, block 5x5 lower and upper triangular system. This problem 
represents the computations associated with a newer class of implicit CFD algorithms, typified at 
NASA Ames by the code INS3D-LU. This problem exhibits a somewhat limited amount of 
parallelism compared to the next two benchmarks. A complete solution of the LU benchmark 
requires 250 iterations. Results are given in Table 7. 
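Each SSOR iteration consists of a forward (lower-triangular) sweep followed by a backward (upper-triangular) sweep. The scalar sketch below shows that structure on a small dense system. In the LU benchmark each unknown is a 5-component vector and each coefficient a 5x5 block; the relaxation factor, test matrix, and iteration count here are illustrative assumptions.

```python
def ssor(A, b, omega=1.2, iterations=50):
    """Symmetric SOR: a forward Gauss-Seidel sweep, then a backward sweep, per iteration."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        for order in (range(n), range(n - 1, -1, -1)):   # lower sweep, then upper sweep
            for i in order:
                sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
                x[i] = (1.0 - omega) * x[i] + omega * (b[i] - sigma) / A[i][i]
    return x

n = 8
A = [[4.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
x_true = [float(i + 1) for i in range(n)]
b = [sum(A[i][j] * x_true[j] for j in range(n)) for i in range(n)]
x = ssor(A, b)
print(max(abs(p - q) for p, q in zip(x, x_true)))
```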

4.2.2: SP Simulated CFD Application (SP) Benchmark 

The second simulated CFD application is called the scalar pentadiagonal (SP) benchmark. In 
this benchmark, multiple independent systems of non-diagonally dominant, scalar pentadiagonal
equations are solved. A complete solution of the SP benchmark requires 400 iterations. Results are
given in Table 8. 
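Because the systems are independent, they can be solved concurrently. One straightforward per-system solve is banded Gaussian elimination without pivoting; without pivoting, fill-in stays inside the band. The sketch below is an illustration under our own assumptions. The test matrix, built from the stencil (1, -4, 6, -4, 1), is symmetric positive definite but not diagonally dominant, echoing the wording above, and is our choice rather than the benchmark's actual coefficients. Dense storage is used only for clarity.

```python
def solve_banded(A, rhs, lower=2, upper=2):
    """Gaussian elimination without pivoting for a band matrix (dense storage for clarity)."""
    n = len(rhs)
    A = [row[:] for row in A]
    b = list(rhs)
    for k in range(n):                                   # forward elimination inside the band
        for i in range(k + 1, min(k + lower + 1, n)):
            m = A[i][k] / A[k][k]
            for j in range(k, min(k + upper + 1, n)):
                A[i][j] -= m * A[k][j]
            b[i] -= m * b[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                       # back substitution
        s = sum(A[i][j] * x[j] for j in range(i + 1, min(i + upper + 1, n)))
        x[i] = (b[i] - s) / A[i][i]
    return x

def pentadiagonal(n, stencil=(1.0, -4.0, 6.0, -4.0, 1.0)):
    return [[stencil[j - i + 2] if abs(j - i) <= 2 else 0.0 for j in range(n)]
            for i in range(n)]

# Several independent "grid lines", each with its own exact solution.
n = 10
A = pentadiagonal(n)
lines = [[float(line + i) for i in range(n)] for line in range(3)]
solutions = []
for x_true in lines:
    b = [sum(A[i][j] * x_true[j] for j in range(n)) for i in range(n)]
    solutions.append(solve_banded(A, b))
err = max(abs(p - q) for xs, xt in zip(solutions, lines) for p, q in zip(xs, xt))
print(err)
```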

4.2.3: BT Simulated CFD Application (BT) Benchmark 

The third simulated CFD application is called the block tridiagonal (BT) benchmark. In this 
benchmark, multiple independent systems of non-diagonally dominant, block tridiagonal 
equations with a 5x5 block size are solved. 

SP and BT are representative of computations associated with the implicit operators of CFD 
codes such as ARC3D at NASA Ames. SP and BT are similar in many respects, but there is a 
fundamental difference with respect to the communication to computation ratio. Timings are cited 
as complete run times, in seconds, as with the other benchmarks. For the BT benchmark, 200 
iterations are required. Results of the BT benchmark are given in Table 9.
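One standard way to solve a block tridiagonal system is the block Thomas algorithm (block LU elimination down the line, then back substitution). The sketch below is our illustration with small dense helpers written from scratch. It uses the 5x5 block size mentioned above, but the number of blocks and the test coefficients are assumptions chosen so that the result can be checked by multiplying back.

```python
def solve(M, b):
    """Gaussian elimination with partial pivoting for a small dense system M x = b."""
    m = len(M)
    aug = [list(row) + [bi] for row, bi in zip(M, b)]
    for k in range(m):
        p = max(range(k, m), key=lambda i: abs(aug[i][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for i in range(k + 1, m):
            f = aug[i][k] / aug[k][k]
            for j in range(k, m + 1):
                aug[i][j] -= f * aug[k][j]
    x = [0.0] * m
    for i in range(m - 1, -1, -1):
        x[i] = (aug[i][m] - sum(aug[i][j] * x[j] for j in range(i + 1, m))) / aug[i][i]
    return x

def solve_matrix(M, B):
    """Return M^-1 B, one column at a time."""
    cols = [solve(M, [row[c] for row in B]) for c in range(len(B[0]))]
    return [[cols[c][r] for c in range(len(cols))] for r in range(len(M))]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def matvec(P, v):
    return [sum(pij * vj for pij, vj in zip(row, v)) for row in P]

def block_thomas(L, D, U, r):
    """Solve L[i] x[i-1] + D[i] x[i] + U[i] x[i+1] = r[i]; L[0] and U[-1] are ignored."""
    nb = len(D)
    G, g = [None] * nb, [None] * nb
    for i in range(nb):
        M, rhs = D[i], r[i]
        if i > 0:                    # eliminate the sub-diagonal block using row i-1
            LG = matmul(L[i], G[i - 1])
            M = [[a - c for a, c in zip(ra, rc)] for ra, rc in zip(D[i], LG)]
            rhs = [a - c for a, c in zip(r[i], matvec(L[i], g[i - 1]))]
        if i < nb - 1:
            G[i] = solve_matrix(M, U[i])
        g[i] = solve(M, rhs)
    x = [None] * nb
    x[-1] = g[-1]
    for i in range(nb - 2, -1, -1):  # back substitution: x[i] = g[i] - G[i] x[i+1]
        x[i] = [a - c for a, c in zip(g[i], matvec(G[i], x[i + 1]))]
    return x

def block_apply(L, D, U, x):
    """Multiply the block tridiagonal matrix by the block vector x."""
    out = []
    for i in range(len(D)):
        y = matvec(D[i], x[i])
        if i > 0:
            y = [a + c for a, c in zip(y, matvec(L[i], x[i - 1]))]
        if i < len(D) - 1:
            y = [a + c for a, c in zip(y, matvec(U[i], x[i + 1]))]
        out.append(y)
    return out

m, nb = 5, 4                         # 5x5 blocks, 4 block rows (illustrative sizes)
D = [[[6.0 if a == b else 0.1 * ((a * b + i) % 3) for b in range(m)] for a in range(m)]
     for i in range(nb)]
L = [[[-1.0 if a == b else 0.05 for b in range(m)] for a in range(m)] for _ in range(nb)]
U = [[[-1.0 if a == b else -0.05 for b in range(m)] for a in range(m)] for _ in range(nb)]
x_true = [[float(i * m + a) for a in range(m)] for i in range(nb)]
x = block_thomas(L, D, U, block_apply(L, D, U, x_true))
err = max(abs(p - q) for xi, ti in zip(x, x_true) for p, q in zip(xi, ti))
print(err)
```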

5: Sustained Performance Per Dollar 

One aspect of the relative performance of these systems has not been addressed so far, namely 
the differences in price between these systems. One should not be too surprised that the CRAY
C90 system, for example, exhibits superior performance rates on these benchmarks, since its 
current list price is much greater than that of the other systems tested. 

One way to compensate for these price differences is to compute sustained performance per 
million dollars, i.e., the performance ratio figures shown in Tables 2 through 9 divided by the list
price in millions. Some figures of this type are shown in Tables 10-12 for the Class B LU, SP, and
BT benchmarks, respectively. The tables include the list price of the minimal system (in terms of
memory per node and number of processors) required to run the full Class B size NPB as
implemented by the vendor. These prices were provided by the vendors and include any
associated software costs, i.e., operating system, compilers, scientific libraries as required, etc.,
but do not include maintenance. Note that some vendors’ standard configurations may include
substantially more hardware than required for the benchmarks (e.g., the IBM SP2). Finally, be
aware that list prices are similar to the peak performance in that they are guaranteed not to be 
exceeded. 
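The price normalization is a single division. The sketch below (names ours) applies it to the Class B SP entries of Table 11 and ranks the systems; prices are the vendor list prices in millions of dollars quoted there, and the IBM SP2-TN2 ratio, 6.79, is the 64-processor Class B value from Table 8. Small last-digit differences from the table can arise because the published ratios are themselves rounded.

```python
def perf_per_million(ratio_to_c90: float, list_price_musd: float) -> float:
    """Performance ratio (vs. one CRAY C90 processor) per million dollars of list price."""
    return ratio_to_c90 / list_price_musd

# (system, ratio to C90/1, list price in $M) for the Class B SP benchmark.
sp_class_b = [
    ("Convex SPP1000", 1.5, 2.50),
    ("CRAY C90", 13.21, 30.50),
    ("CRAY T3D", 5.29, 3.6),
    ("DEC Alpha Server 8400 5/300", 1.53, 0.42),
    ("Fujitsu VPP500", 64.73, 31.00),
    ("IBM SP2-WN", 7.52, 5.94),
    ("IBM SP2-TN2", 6.79, 4.30),
    ("SGI PC-XL (75 MHz)", 2.20, 1.02),
]
ranked = sorted(sp_class_b, key=lambda s: perf_per_million(s[1], s[2]), reverse=True)
for name, ratio, price in ranked:
    print(f"{name:30s} {perf_per_million(ratio, price):5.2f}")
```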

6: Observations and Comments 

1. The parallel-vector processor CRAY C90 is no longer the performance leader. The absolute
performance of the three CFD application benchmarks (LU, SP, and BT) on 512 PEs of the CRAY
T3D and on 160 nodes of the IBM SP2-WN is significantly greater than on 16 CPUs of the CRAY C90.

2. When the system performance is normalized by system price, all the highly parallel systems 
outperform the CRAY C90. 

3. Portability of the NPB is a big issue. Each vendor uses its own programming paradigm for 
parallelization [4], for example: 

a. Convex SPP1000: Convex-specific directives for achieving parallelization.

b. CRAY C90: Cray-specific directives (Microtasking and Autotasking). 

c. CRAY J90: Cray-specific directives. 

d. CRAY T3D: Explicit shared-memory model using shmem_get and shmem_put. This paradigm is
not a message-passing paradigm.

e. Fujitsu VPP500: Fujitsu-specific parallel directives. 

f. IBM SP2-WN and IBM SP2-TN2: IBM-specific message-passing library called MPL. 

g. SGI PC-XL (75 MHz): SGI-specific directives.

4. To date no vendor has implemented NPB in Message Passing Interface (MPI) or High 
Performance Fortran (HPF). We recommend that vendors use either HPF or MPI for running 
NPB on their machines. 

5. NAS is writing NPB in HPF and MPI. We hope to announce these at Supercomputing '95 in 
San Diego. 

6. NAS is also upgrading the existing NPB to include unstructured grids and multidisciplinary
fields (coupling of fluid dynamics, structural mechanics, etc.), which will be announced/released
at Supercomputing '96.

7. The best computer based on performance per dollar for the Class B SP and BT benchmarks is a
Symmetric Multiprocessor (SMP) machine, the DEC Alpha Server 8400 5/300 (also called Turbo
Laser) from Digital Equipment Corporation. The peak performance of a single processor used in
this SMP is 600 Mflop/s.
Table 2: Results of the Embarrassingly Parallel (EP) benchmark.


Computer System 


Convex Exemplar 
SPP1000 


Date 

Received 





IBM SP2-WN 
(Wide Nodes) 


IBM SP2-TN2 
(Thin Nodes 2) 


Silicon Graphics 
Power Challenge XL 
(75 MHz) 



CRAY J90 

Feb 

95 

1 

169.44 




2 

86.70 




4 

43.09 




8 

21.54 

CRAY T3D 

Feb 

95 

16 

22.74 




32 

11.37 




64 

5.68 




128 

2.87 




256 

1.44 




512 

0.72 




1024 

0.55 

CRAY T90 

Feb 

95 

1 

18.56 
Table 3: Results of the Multigrid (MG) benchmark.

Computer System, date received
   Procs     Class A  Ratio to     Class B  Ratio to
            Time (s)    Y-MP/1    Time (s)     C90/1

Convex Exemplar SPP1000, received Mar 95
       1       208.0      0.11          NA        NA
       8        29.9      0.74       150.4      0.22
      16        17.3      1.28        85.1      0.40
      32        11.0      2.02        52.7      0.64
      64          NA        NA        39.6      0.85

CRAY C90, received Feb 95
       1        7.27      3.06       33.78      1.00
       2        3.71      5.99       17.24      1.96
       4        1.92     11.58        8.89      3.80
       8        1.10     20.20        4.59      7.36
      16        0.71     31.30        3.43      9.85

CRAY J90, received Feb 95
       1       39.08      0.57          NA        NA
       2       20.52      1.09          NA        NA
       4       10.75      2.07          NA        NA
       8        6.14      3.62          NA        NA

CRAY T3D, received Feb 95
      16                             66.58      0.51
      32                             30.10      1.11
      64        2.61      8.51       12.56      2.69
     128        1.36     16.34        6.57      5.14
     256        0.74     30.03        3.37     10.02
     512        0.39     56.97        1.74     19.41
    1024        0.25     88.88        1.15     29.38

CRAY T90, received Feb 95
       1        4.57      4.86          NA        NA

CRAY Y-MP, received Aug 92
       1       22.22      1.00          NA        NA
       8        2.96      7.51          NA        NA

Fujitsu VPP500, received Mar 95
       4        1.44     15.43        6.81      4.96
       8        0.75     29.63        3.59      9.41
      16        0.42     52.90        2.01     16.81
      32        0.26     85.46        1.26     26.81

IBM SP2-WN (Wide Nodes), received Oct 94
       8        6.04                 27.92      1.21
      16        3.17                 14.58      2.32
      32        1.69     13.15        7.72      4.38
      64        0.95                  4.36      7.75
     128        0.53                  2.46     13.73

IBM SP2-TN2 (Thin Nodes 2), received Feb 95
       8        7.18      3.09       32.73      1.04
      16        3.74      5.94       17.13      1.97
      32        1.99     11.17        9.14      3.96
      64        1.12     19.84        5.20      6.50
     128        0.63     35.27        2.95     11.45
Table 4: Results of the Conjugate Gradient (CG) benchmark.

Computer System, date received
   Procs     Class A  Ratio to     Class B  Ratio to
            Time (s)    Y-MP/1    Time (s)     C90/1

Convex Exemplar SPP1000, received Mar 95
       1       202.9      0.06          NA        NA
       8        22.2      0.54          NA        NA
      16        8.94      1.33       837.0
      32        4.30      2.77       485.4
      64          NA        NA       292.1

CRAY C90, received Feb 95
       1        3.43      3.48      122.90      1.00
       2        1.79      6.66       63.11      1.95
       4        0.95     12.55       33.25      3.70
       8        0.53     22.49       18.11      6.79
      16        0.34     35.06       10.61     11.58

CRAY J90, received Feb 95
       1       15.93      0.75          NA        NA
       2        8.42      1.42          NA        NA
       4        4.42      2.70          NA        NA
       8        2.61      4.57          NA        NA

CRAY T3D, received Feb 95
      16       14.37      0.83      570.11
      32        7.44      1.60      291.30
      64        3.93      3.03      158.81      0.77
     128        2.11      5.65       82.07      1.50
     256        1.21      9.85       47.15      2.61
     512        0.72     16.56       27.34      4.50
    1024        0.58      20.6       16.58      7.41

CRAY T90, received Feb 95
       1       1.955      6.10          NA        NA

CRAY Y-MP, received Aug 92
       1       11.92      1.00          NA        NA
       8        2.38      5.01          NA        NA

Fujitsu VPP500, received Aug 94
       1        5.68      2.10          NA        NA
       2        3.06      3.90      104.51      1.18
       4        1.72      6.93       55.40      2.22
       8        1.04     11.46       31.80      3.86
      15          NA        NA       20.85      5.89
      16        0.80     14.90          NA        NA
      30          NA        NA       15.21      8.08

IBM SP2-WN (Wide Nodes), received Mar 94
       8        4.91      2.43      156.21      0.79
      16        3.09      3.86        88.4      1.39
      32        2.09      5.70       52.53      2.34
      64         1.6      7.45       33.79      3.64
     128        1.38      8.64       25.44      4.83

IBM SP2-TN2 (Thin Nodes 2), received Mar 95
       8        5.60      2.13      234.46      0.52
      16        3.48      3.43      120.23      1.02
      32        2.34      5.09       67.16      1.83
      64        1.72      6.93       38.52      3.19
     128        1.48      8.05       28.50      4.31

Silicon Graphics Power Challenge XL (75 MHz), received Oct 94
       1        39.0      0.31          NA        NA
       2        16.9      0.71          NA        NA
       4         7.2      1.66          NA        NA
       8         4.5      2.65          NA        NA
      16         3.5      3.41          NA        NA
Table 5: Results of the 3-D FFT PDE (FT) benchmark.

Computer System, date received
   Procs     Class A  Ratio to     Class B  Ratio to
            Time (s)    Y-MP/1    Time (s)     C90/1

Convex Exemplar SPP1000, received Mar 95
       1       178.6      0.16          NA        NA
       8        25.5      1.13       375.4
      16        20.5      1.40          NA        NA
      32        13.9      2.07          NA        NA

CRAY C90, received Feb 95
       1        8.95      3.21      110.60      1.00
       2        4.53      6.35       55.75      1.98
       4        2.29     12.56       27.95      3.96
       8        1.29     22.30       14.12      7.83
      16        0.80     35.97                 14.46

CRAY J90, received Feb 95
       1       42.84      0.67          NA        NA
       2       22.08      1.30          NA        NA
       4       11.21      2.57          NA        NA
       8        6.15      4.68          NA        NA

CRAY T3D, received Feb 95
      16       11.80      2.44          NA        NA
      32        5.90      4.87          NA        NA
      64        2.99      9.62       40.57      2.73
     128        1.52     18.93       20.68      5.35
     256        0.77     37.36       10.77     10.27
     512        0.51     56.41        6.44     17.17
    1024        0.32     89.91        3.76     29.41

CRAY Y-MP, received Feb 95
       1                                NA        NA
       8                  6.87          NA        NA

CRAY T90, received Feb 95
       1        5.23      5.50          NA        NA

Fujitsu VPP500, received Aug 94
       4        2.93      9.82          NA        NA
       8        1.45     19.84          NA        NA
      16        0.75     38.36        7.95     13.91
      32        0.40     71.93        4.07     27.17
      64        0.24    119.88        2.18     50.73

IBM SP2-WN (Wide Nodes), received Oct 94
       8       13.31      2.16          NA        NA
      16        7.17      4.01        91.8      1.20
      32        3.96      7.27       47.23      2.34
      64        2.19      13.4       26.05      4.25
     128        1.23     23.39       14.52      7.62

IBM SP2-TN2 (Thin Nodes 2), received Mar 95
       8       14.78      1.95          NA        NA
      16        8.09      3.56      101.03      1.09
      32        4.31      6.68       51.38      2.15
      64        2.39     12.04       28.02      3.95
     128        1.30     22.13       15.68      7.05

Silicon Graphics Power Challenge XL (75 MHz), received Oct 94
       1       61.17      0.47      761.67
       2       35.53      0.81      414.52      0.27
       4       19.98      1.44      223.97      0.49
       8       12.57      2.29      130.15      0.85
      16       11.18      2.57      110.37      1.00


Table 6: Results of the Integer Sort (IS) benchmark.

Computer System, date received
   Procs     Class A  Ratio to     Class B  Ratio to
            Time (s)    Y-MP/1    Time (s)     C90/1

Convex Exemplar SPP1000, received Mar 95
       1        83.2      0.14          NA        NA
       8        10.1      1.13        43.5

CRAY C90, received Feb 95
       1        3.33      3.44       12.92       1.0
       2        1.64      6.99        6.50      1.99
       4        0.85     13.48        3.30      3.92
       8        0.46     24.91        1.73      7.47
      16        0.27     42.44        0.98     13.18

CRAY J90, received Feb 95
       1       13.75      0.83          NA        NA
       2        7.02      1.63          NA        NA
       4        3.81      3.00          NA        NA
       8        2.21      5.19          NA        NA

CRAY T3D, received Feb 95
      16        7.07      1.62          NA        NA
      32        3.89      2.95       16.57      0.78
      64        2.09      5.48        8.74      1.48
     128        1.05     10.91        4.56      2.83
     256        0.55     20.84        2.36      5.47
     512        0.31     36.97        1.33      9.71
    1024        0.44     26.05        1.22     10.59

CRAY T90, received Feb 95
       1        2.06      5.56          NA        NA

CRAY Y-MP, received Aug 92
       1       11.46      1.00          NA        NA
       8        1.85      6.19          NA        NA

Fujitsu VPP500, received Apr 94
       1       2.189      5.24          NA        NA
       2       1.574      7.28          NA        NA
       4       1.098     10.44                  3.49
       8       0.917     12.50                  4.26

IBM SP2-WN (Wide Nodes), received Mar 95
       8        4.93                 19.75      0.65
      16        2.65                 10.60      1.22
      32        1.54      7.44        5.92      2.18
      64        0.89     12.88        3.41      3.79
     128        0.59     19.42        1.98      6.53

IBM SP2-TN2 (Thin Nodes 2), received Feb 95
       8        5.16      2.22       20.79      0.62
      16        2.89      3.97       11.46      1.13
      32        1.66      6.90        6.37      2.03
      64        0.91     12.59        3.58      3.61
     128        0.61     18.79        2.05      6.30


Table 7: Results of the LU simulated CFD application (LU) benchmark.

Computer System, date received
   Procs     Class A  Ratio to     Class B  Ratio to
            Time (s)    Y-MP/1    Time (s)     C90/1

Convex Exemplar SPP1000, received Mar 95
       1        2668      0.13          NA        NA
       8         331      1.00        1492      0.30
      16         196                   827      0.54
      32         126                 465.9      0.96

CRAY C90, received Feb 95
       1      119.78      2.78      449.54      1.00
       2       62.29      5.35      231.98      1.94
       4       32.20     10.36      121.26      3.71
       8       17.15     19.45       63.03      7.13
      16       10.17     32.79       37.93     11.85

CRAY J90, received Feb 95
       1      495.22      0.67          NA        NA
       2      260.58      1.28          NA        NA
       4      138.99      2.40          NA        NA
       8       77.70      4.29          NA        NA

CRAY T3D, received Feb 95
      16      205.69      1.62      844.53      0.53
      32      106.89      3.12      451.18      1.00
      64       55.32      6.03      233.45      1.93
     128       28.71     11.62      120.53      3.73
     256       15.94     20.92       65.06       6.9
     512        9.02     36.97       36.39     12.35
    1024        7.09      47.4       20.77     21.64

CRAY T90, received Feb 95
       1       82.67      4.03          NA        NA

CRAY Y-MP, received Aug 92
       1       333.5      1.00          NA        NA
       8        49.5      6.74          NA        NA

Fujitsu VPP500, received Aug 94
       1      146.89      2.27      591.05      0.76

IBM SP2-WN (Wide Nodes), received Mar 95
       8       112.5      2.96       429.8      1.05
      16        64.6      5.16       234.4      1.92
      32        36.5      9.14       129.7      3.47
      64        22.7     14.69        76.8      5.85
     128        15.2     21.94        47.8      9.41

IBM SP2-TN2 (Thin Nodes 2), received Mar 95
       8       120.8      2.76       477.3      0.94
      16        70.9      4.70       255.4      1.76
      32        40.1      8.32       141.3      3.18
      64        24.5     13.61        82.9      5.42
     128        15.9     20.97        51.2      8.78

Silicon Graphics Power Challenge XL (75 MHz), received Oct 94
       1       604.0      0.55      2617.9      0.17
       4       231.8      1.44      1010.5      0.44
       8       111.7      2.99       550.2      0.82
      16        65.3      5.11       308.1      1.46


Table 8: Results of the SP simulated CFD application (SP) benchmark.

Computer System, date received
   Procs     Class A  Ratio to     Class B  Ratio to
            Time (s)    Y-MP/1    Time (s)     C90/1

Convex Exemplar SPP1000, received Mar 95
       1        2533      0.19          NA        NA
       8         345      1.37        1584      0.44
      16         228      2.07        1068      0.65
      32         144      3.27       697.4      0.99
      64         102      4.62       449.5       1.5

CRAY C90, received Feb 95
       1      174.50      2.70      689.60      1.00
       2       87.32      5.40      345.57      2.00
       4       44.75     10.54      175.85      3.92
       8       22.74     20.73                  7.59
      16       12.82     36.78                 13.21

CRAY J90, received Feb 95
       1      871.34      0.54          NA        NA
       2      445.25      1.06          NA        NA
       4      232.43      2.03          NA        NA
       8      128.71      3.66          NA        NA

CRAY T3D, received Feb 95
      16      202.11      2.33      818.07      0.84
      32      104.10      4.53      463.62      1.49
      64       53.26      8.85      233.52      2.95
     128       27.54     17.12                  5.29
     256       14.71     32.05       74.89      9.21
     512        8.91     52.92       42.63     16.18
    1024        5.41     87.15       25.23     27.33

CRAY T90, received Feb 95
       1      114.78                    NA        NA

CRAY Y-MP, received Aug 92
       1       471.5      1.00          NA        NA
       8        64.6                    NA        NA

DEC Alpha Server 8400 5/300, received Mar 95
       1      749.61      0.63                  0.20
       4      199.17      2.37      904.45      0.76
       8      118.04      3.99      452.13      1.53
      12      102.75      4.59      364.54      1.89

Fujitsu VPP500, received Mar 95
       1      99.309      4.75      404.08      1.71
       2      61.588      7.66      241.23      2.86
       4      32.114     14.68      127.48      5.41
       6          NA        NA      83.710      8.24
       8      16.399     28.75      64.930     10.62
      16      8.5761     54.98          NA        NA
      17          NA        NA      30.474     22.63
      32      4.5355    103.96          NA        NA
      34          NA        NA      15.674      44.0
      51          NA        NA      10.654     64.73
      64      2.5483     185.0          NA        NA

IBM SP2-WN (Wide Nodes), received Mar 95
       8       143.8      3.27       589.3      1.17
      16        83.2      5.67       300.6      2.29
      32        48.7      9.68       163.8      4.21
      64        30.1     15.66        91.7      7.52
     128        18.7     25.21        54.8     12.58

IBM SP2-TN2 (Thin Nodes 2), received Mar 95
       8       161.1      2.93       640.9      1.08
      16        93.3      5.05       342.3      2.01
      32        53.6      8.80       184.4      3.74
      64        32.7     14.42       101.6      6.79
     128        20.6     22.89        59.9     11.51

Silicon Graphics Power Challenge XL (75 MHz), received Oct 94
       1       858.3      0.55      3719.5      0.19
       4       225.8      2.09       947.6      0.73
       8       119.5      3.95       491.4      1.40
      16        67.2      7.02       313.1      2.20
Table 9: Results of the BT simulated CFD application (BT) benchmark.

Computer System, date received
   Procs     Class A  Ratio to     Class B  Ratio to
            Time (s)    Y-MP/1    Time (s)     C90/1

Convex Exemplar SPP1000, received Mar 95
       1        2825      0.28          NA        NA
       8         366      2.17        1675      0.61
      16         211      3.76         984      1.04
      32         125      6.34       559.8      1.82
      64          78     10.16       338.2      3.03

CRAY C90, received Feb 95
       1      276.80      2.86      1023.4      1.00
       2      139.44      5.68      519.46      1.97
       4       72.11     10.99                  3.86
       8       36.99     21.42      138.16      7.41
      16       20.30     39.03       78.80     12.99

CRAY J90, received Mar 95
       1     1209.64      0.66          NA        NA
       2      624.05      1.27          NA        NA
       4      324.73      2.44          NA        NA
       8      178.06      4.45          NA        NA

CRAY T3D, received Feb 95
      16      230.41      3.44      918.04      1.11
      32      115.53      6.85      476.97      2.15
      64       59.01     13.43      252.86
     128       29.96     26.44      128.21      7.98
     256       15.89     49.87       68.38      15.0
     512        8.39     94.45                 26.92
    1024        4.56    173.77       20.45     50.04

CRAY T90, received Feb 95
       1      193.19      4.10          NA        NA

CRAY Y-MP, received Aug 92
       1                  1.00          NA        NA
       8                  6.95          NA        NA

DEC Alpha Server 8400 5/300, received Mar 95
       1     1113.90      0.71     4076.50      0.25
       2      551.80      1.44     2525.00      0.41
       4      286.97      2.76     1278.60      0.80
       8      146.91      5.39      649.53      1.58
      12      103.47      7.66      458.21      2.23

Fujitsu VPP500, received Mar 95
       1      142.42      5.56          NA        NA
       2       75.17     10.54          NA        NA
       4       39.14     20.25          NA        NA
       8       19.82     39.98          NA        NA
      16        9.99     79.32          NA        NA
      17          NA        NA       37.26     27.47
      32                155.68          NA        NA
      34          NA        NA       18.82     54.38
      51          NA        NA       12.61     81.16
      64                297.90          NA        NA

IBM SP2-WN (Wide Nodes), received Mar 95
       8       206.7      3.83       862.8      1.19
      16       112.9      7.02       440.6      2.32
      32        61.8     12.82       226.8      4.51
      64        34.7     22.84       119.1      8.59
     128        20.1     39.42        67.0     15.27

IBM SP2-TN2 (Thin Nodes 2), received Feb 95
       8       216.6      3.66       889.8      1.15
      16       118.0      6.72       459.2      2.23
      32        64.9     12.21       237.2      4.31
      64        36.3     21.83       124.8      8.20
     128        20.8     38.10        69.6     14.70

Silicon Graphics Power Challenge XL (75 MHz), received Oct 94
       1      1330.3      0.60      5698.7      0.18
       4       355.9      2.23      1450.0      0.71
       8       177.0      4.48       775.0      1.32
      16        91.8      8.63       426.0      2.40


Table 10: Approximate sustained performance per dollar for the Class B LU benchmark.

Computer System              Procs  Memory        Ratio to  List price   Performance    Date
                                                  C90/1     (million $)  per million $
Convex SPP1000               32     4 GB          0.96      1.25         0.77           Mar 95
CRAY C90                     16     2 GB          11.85     30.50        0.39           Mar 95
CRAY T3D (no front end)      128    64 MB/PE      3.73      3.6          1.04           Mar 95
IBM SP2-WN                   64     128 MB/PE     5.85      5.94         0.98           Mar 95
IBM SP2-TN2                  64     64 MB/PE      5.42      4.30         1.26           Mar 95
SGI PC-XL (75 MHz)           16     2 GB (total)  1.46      1.02         1.43           Jun 94


Table 11: Approximate sustained performance per dollar for the Class B SP benchmark.

Computer System              Procs  Memory        Ratio to  List price   Performance    Date
                                                  C90/1     (million $)  per million $
Convex SPP1000               64     8 GB          1.5       2.50         0.60           Mar 95
CRAY C90                     16     2 GB          13.21     30.50        0.43           Mar 95
CRAY T3D (no front end)      128    64 MB/PE      5.29      3.6          1.47           Mar 95
DEC Alpha Server 8400 5/300  8      256 MB/PE     1.53      0.42         3.64           Mar 95
Fujitsu VPP500               51     256 MB/PE     64.73     31.00        2.09           Mar 95
IBM SP2-WN                   64     128 MB/PE     7.52      5.94         1.27           Mar 95
IBM SP2-TN2                  64     64 MB/PE      6.79      4.30         1.58           Mar 95
SGI PC-XL (75 MHz)           16     2 GB (total)  2.20      1.02         2.15           Jun 94


Table 12: Approximate sustained performance per dollar for the Class B BT benchmark.

Computer System              Procs  Memory        Ratio to  List price   Performance    Date
                                                  C90/1     (million $)  per million $
Convex SPP1000               64     8 GB          3.03      2.50         1.21           Mar 95
CRAY C90                     16     2 GB          12.99     30.50        0.43           Mar 95
CRAY T3D (no front end)      128    64 MB/PE      7.98      3.6          2.22           Mar 95
DEC Alpha Server 8400 5/300  8      256 MB/PE     1.58      0.42         3.76           Mar 95
Fujitsu VPP500               51     256 MB/PE     81.16     31.00        2.62           Mar 95
IBM SP2-WN                   64     128 MB/PE     8.59      5.94         1.45           Mar 95
IBM SP2-TN2                  64     64 MB/PE      8.20      4.30         1.91           Mar 95
SGI PC-XL (75 MHz)           16     2 GB (total)  2.40      1.02         2.35           Jun 94